Analysis and Optimization of the Hadoop Speculative Execution Mechanism
Authors
Abstract
Existing Hadoop clusters are mostly composed of heterogeneous nodes with different computing and storage capacities, so the speeds of map and reduce tasks performed on the nodes differ considerably. Because the finish time of the entire job is determined by its slowest task, the strategy for finding these "drag tasks" plays a dominant role in the whole job scheduling process. The current speculative execution mechanism of Hadoop falls short in finding drag tasks in time. In this paper we improve the speculative execution mechanism and apply the improved First-in-First-out (FIFO) scheduler to a Hadoop cluster. Experiments verify that the improved mechanism performs better when the Hadoop platform has many drag tasks, increasing cluster resource utilization and throughput.
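The core idea the abstract describes, estimating which running tasks will drag out the job's finish time and launching speculative copies of them, can be sketched as follows. This is a minimal illustration in the spirit of LATE-style straggler detection, not the paper's actual algorithm; the function names, the progress-rate estimator, and the 25% threshold are all illustrative assumptions.

```python
# Hedged sketch of "drag task" (straggler) detection for speculative execution.
# Assumption: each task reports a progress fraction in [0, 1] and elapsed seconds.

def estimate_time_left(progress, elapsed):
    """Estimate remaining time from a task's observed progress rate."""
    if progress <= 0:
        return float("inf")  # no progress yet: treat as the slowest possible
    rate = progress / elapsed          # progress per second so far
    return (1.0 - progress) / rate     # seconds to finish at that rate

def pick_speculative_candidates(tasks, slow_fraction=0.25):
    """Return the task IDs in the slowest `slow_fraction` by estimated finish time.

    tasks: dict mapping task_id -> (progress fraction, elapsed seconds).
    """
    remaining = {tid: estimate_time_left(p, e) for tid, (p, e) in tasks.items()}
    ranked = sorted(remaining, key=remaining.get, reverse=True)
    n = max(1, int(len(ranked) * slow_fraction))
    return ranked[:n]

# Example: three map tasks with equal elapsed time but unequal progress.
tasks = {"m1": (0.9, 10), "m2": (0.5, 10), "m3": (0.1, 10)}
print(pick_speculative_candidates(tasks))  # ['m3'] — m3 progresses slowest
```

The estimator assumes a roughly constant progress rate per task; the paper's contribution is precisely in making this kind of estimate more accurate on heterogeneous nodes, where a naive rate extrapolation can mislabel tasks.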
Similar Resources
Optimization Framework for Map Reduce Clusters on Hadoop’s Configuration
Hadoop is a Java-based distributed computing framework designed to support applications implemented via the MapReduce programming model. Hadoop performance, however, is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming. The existing system uses a Random forest approa...
Speculation-aware Resource Allocation for Cluster Schedulers
Resource allocation and straggler mitigation (via "speculative" copies) are two key building blocks for analytics frameworks. Today, the two solutions are largely decoupled from each other, losing the opportunity for joint optimization. Resource allocation across jobs assumes that each job runs a fixed set of tasks, ignoring their need to dynamically run speculative copies for stragglers. Cons...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy for MapReduce in a homogeneous Hadoop cluster assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment
Distributed computing has achieved tremendous development since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collection and analysis models, e.g., the Internet of Things, Cyber-Physical Systems, Big Data analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of its core components, MapReduce facilitates allocating, p...
Budget based dynamic slot allocation for MapReduce clusters
MapReduce is a programming model for processing large amounts of data in the cloud, where resource allocation is an active research area since it is responsible for improving the performance of Hadoop. Resource allocation can be further improved by a set of mechanisms, including the budget-based HFS algorithm, where the fast worker node is identified first based...
Publication date: 2016